AITopics | critical parameter

critical parameter

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DataStealing

Neural Information Processing Systems, Feb-18-2026, 15:29:41 GMT

Federated Learning (FL) is commonly used to collaboratively train models with privacy preservation. Specifically, AdaSCP evaluates the importance of parameters with the gradients in dominant timesteps of the diffusion model.

artificial intelligence, diffusion model, machine learning

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Parameter Importance-Driven Continual Learning for Foundation Models

Wang, Lingxiang, Zhang, Hainan, Zheng, Zhiming

arXiv.org Artificial Intelligence, Nov-20-2025

Domain-specific post-training often causes catastrophic forgetting, making foundation models lose their general reasoning ability and limiting their adaptability to dynamic real-world environments. Preserving general capabilities while acquiring downstream domain knowledge is a central challenge for large language and multimodal models. Traditional continual learning methods, such as regularization, replay and architectural isolation, suffer from poor downstream performance, reliance on inaccessible historical data, or additional parameter overhead. While recent parameter-efficient tuning (PET) methods can alleviate forgetting, their effectiveness strongly depends on the choice of parameters and update strategies. In this paper, we introduce PIECE, a Parameter Importance Estimation-based Continual Enhancement method that preserves general ability while efficiently learning domain knowledge without accessing prior training data or increasing model parameters. PIECE selectively updates only 0.1% of core parameters most relevant to new tasks, guided by two importance estimators: PIECE-F based on Fisher Information, and PIECE-S based on a second-order normalization that combines gradient and curvature information. Experiments across three language models and two multimodal models show that PIECE maintains general capabilities and achieves state-of-the-art continual learning performance across diverse downstream tasks. Our results highlight a practical path to scalable, domain-adaptive foundation models without catastrophic forgetting.
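
To make the idea of importance-guided sparse updating concrete, here is a minimal sketch in the spirit of PIECE-F. It assumes that the importance score is the diagonal empirical Fisher information (the mean squared per-sample gradient), that selection keeps the top fraction of parameters, and that plain SGD updates only that subset. The function names and the list-based representation are hypothetical. PIECE-S's second-order normalization is not shown. The toy example keeps 25% of four parameters rather than the paper's 0.1%, so that something is actually selected.

```python
import math


def fisher_importance(per_sample_grads):
    """Diagonal empirical Fisher: mean of squared per-sample gradients for each parameter."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[j] ** 2 for g in per_sample_grads) / n for j in range(dim)]


def top_fraction_mask(scores, fraction):
    """Boolean mask marking the highest-scoring `fraction` of parameters (at least one)."""
    k = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    chosen = set(ranked[:k])
    return [j in chosen for j in range(len(scores))]


def masked_sgd_step(params, grad, mask, lr):
    """Plain SGD step applied only to masked parameters; all others stay frozen."""
    return [p - lr * g if m else p for p, g, m in zip(params, grad, mask)]


# Toy example: two per-sample gradients for a 4-parameter model.
grads = [[0.1, 2.0, -0.3, 0.0], [0.2, -1.5, 0.1, 0.05]]
mask = top_fraction_mask(fisher_importance(grads), 0.25)  # keeps 1 of 4 parameters
# mask → [False, True, False, False]
params = masked_sgd_step([1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5], mask, lr=0.5)
# params → [1.0, 0.75, 1.0, 1.0]
```

Because frozen parameters are never touched, whatever general ability they encode is preserved by construction. The open question, which the importance estimator is meant to answer, is which small subset to unfreeze.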

large language model, machine learning, natural language

2511.15375

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine (0.68)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Where to Search: Measure the Prior-Structured Search Space of LLM Agents

Song, Zhuo-Yang

arXiv.org Artificial Intelligence, Nov-4-2025

The generate-filter-refine iterative paradigm based on large language models (LLMs) has driven progress in reasoning, programming, and program discovery in AI+Science. However, the effectiveness of search depends on where to search, namely, on how the domain prior is encoded into an operationally structured hypothesis space. To this end, this paper proposes a compact formal theory that describes and measures LLM-assisted iterative search guided by domain priors. We represent an agent as a fuzzy relation operator on inputs and outputs to capture feasible transitions; the agent is thereby constrained by a fixed safety envelope. To describe multi-step reasoning and search, we weight all reachable paths by a single continuation parameter and sum them to obtain a coverage generating function. This induces a measure of reachability difficulty and provides a geometric interpretation of search on the graph induced by the safety envelope. We further derive the simplest testable inferences and validate them via two instantiations. This theory offers a workable language and operational tools for measuring agents and their search spaces, providing a systematic formal description of iterative search constructed by LLMs.
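
One way to read "weight all reachable paths by a single continuation parameter and sum them" is as a path-counting generating function on the envelope graph: the coefficient of z^k is the number of length-k paths from a source to a target. The sketch below assumes a crisp 0/1 adjacency matrix in place of the paper's fuzzy relation and truncates the series at a hypothetical `max_len`. It does not reproduce the paper's exact definitions or its difficulty measure. A small z discounts long paths heavily, so targets reachable only through long chains contribute little coverage, which matches the intuition of reachability difficulty.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def coverage_series(adj, source, target, z, max_len):
    """Truncated sum over k = 1..max_len of z**k times the number of length-k paths source -> target."""
    total = 0.0
    power = adj  # holds adj**k at the start of iteration k
    for k in range(1, max_len + 1):
        total += (z ** k) * power[source][target]
        power = mat_mul(power, adj)
    return total


# Hypothetical envelope graph: 0 -> 1 -> 3 and 0 -> 2 -> 3 (two length-2 routes to node 3).
diamond = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(coverage_series(diamond, 0, 3, z=0.5, max_len=6))  # 0.5, i.e. 2 * 0.5**2
```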

artificial intelligence, large language model, natural language

2510.14846

Genre: Research Report (0.43)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

FedPURIN: Programmed Update and Reduced INformation for Sparse Personalized Federated Learning

Xie, Lunchen, He, Zehua, Shi, Qingjiang

arXiv.org Artificial Intelligence, Oct-21-2025

Personalized Federated Learning (PFL) has emerged as a critical research frontier for addressing data heterogeneity across distributed clients. Novel model architectures and collaboration mechanisms are engineered to accommodate statistical disparities while producing client-specific models. Parameter decoupling represents a promising paradigm for maintaining model performance in PFL frameworks. However, the communication efficiency of many existing methods remains suboptimal, imposing substantial communication burdens that impede practical deployment. To bridge this gap, we propose Federated Learning with Programmed Update and Reduced INformation (FedPURIN), a novel framework that strategically identifies critical parameters for transmission through an integer programming formulation. This mathematically grounded strategy is seamlessly integrated into a sparse aggregation scheme, achieving a significant communication reduction while preserving model efficacy. Comprehensive evaluations on standard image classification benchmarks under varied non-IID conditions demonstrate competitive performance relative to state-of-the-art methods, coupled with quantifiable communication reduction through sparse aggregation. The framework establishes a new paradigm for communication-efficient PFL, particularly advantageous for edge intelligence systems operating with heterogeneous data sources.

Introduction
Federated learning (FL), as a powerful distributed machine learning scheme, has been well studied as a way to harness the growing volume of data on ubiquitous edge devices [1]. This framework has been successfully applied in various domains, including computer vision [2, 3], healthcare [4, 5], finance [6, 7], and ubiquitous IoT applications [8, 9, 10].
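
The combination of "identify critical parameters via integer programming" and "sparse aggregation" can be illustrated with a deliberately simplified stand-in. If each client may send at most k coordinates and wants to maximize the total update magnitude it transmits, the integer program max Σ x_i |Δ_i| subject to Σ x_i ≤ k, x_i ∈ {0, 1} is solved exactly by keeping the k largest-magnitude entries. The sketch below assumes that toy objective; FedPURIN's actual formulation, its parameter decoupling and its personalization are not reproduced, and all function names are hypothetical. The server averages each coordinate only over the clients that sent it.

```python
def select_critical(delta, k):
    """Indices of the k largest-magnitude entries.

    This is the exact solution of the toy integer program
    max sum_i x_i * |delta_i|  s.t.  sum_i x_i <= k,  x_i in {0, 1}.
    """
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    return sorted(ranked[:k])


def sparsify(delta, k):
    """Client side: transmit only (index, value) pairs for the selected coordinates."""
    return {i: delta[i] for i in select_critical(delta, k)}


def sparse_aggregate(sparse_updates, dim):
    """Server side: average each coordinate over the clients that actually sent it."""
    sums = [0.0] * dim
    counts = [0] * dim
    for update in sparse_updates:
        for i, v in update.items():
            sums[i] += v
            counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


client_a = sparsify([0.9, -0.1, 0.0, 0.4], k=2)   # {0: 0.9, 3: 0.4}
client_b = sparsify([0.1, -0.8, 0.05, 0.6], k=2)  # {1: -0.8, 3: 0.6}
global_update = sparse_aggregate([client_a, client_b], dim=4)
```

Each client uploads k of its dim values plus their indices, which is where the communication saving comes from.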

artificial intelligence, fedpurin, machine learning

2510.16065

Country: Asia > China (0.28)

Genre: Research Report > Promising Solution (0.86)

Industry: Health & Medicine (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

DataStealing: Steal Data from Diffusion Models in Federated Learning with Multiple Trojans

Neural Information Processing Systems, Oct-10-2025, 20:59:44 GMT

Parameters (AdaSCP) attack to circumvent the defenses and seamlessly incorporate malicious updates into the global model. Specifically, AdaSCP evaluates the importance of parameters using the gradients at dominant timesteps of the diffusion model. Subsequently, it adaptively seeks the optimal scale factor and magnifies critical parameter updates before uploading them to the server. As a result, the malicious update becomes similar to the benign update, making it difficult for distance-based defenses to detect. Extensive experiments reveal the risk of leaking thousands of images when training diffusion models with FL.
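
For context on what "distance-based defenses" check, here is a generic sketch of such a filter, seen from the defender's side. It assumes a simple rule that is not any specific defense evaluated in the paper: compute the coordinate-wise median of the client updates and flag any update whose Euclidean distance to that median exceeds a hypothetical factor `tau` times the median distance. A naive poisoned update far from the benign cluster is flagged. The abstract's point is that AdaSCP shapes its update to stay close to benign ones, so a rule of this kind would not separate it.

```python
import math
import statistics


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flag_far_updates(updates, tau=2.0):
    """Flag updates farther than tau * (median distance) from the coordinate-wise median update."""
    center = [statistics.median(col) for col in zip(*updates)]
    dists = [l2(u, center) for u in updates]
    ref = statistics.median(dists)
    return [d > tau * ref for d in dists]


benign = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
print(flag_far_updates(benign + [[5.0, 5.0]]))  # [False, False, False, True]
```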

diffusion model, experiment, indicator

Country:

North America > United States (0.14)
Asia > China (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Optimizing Force Signals from Human Demonstrations of In-Contact Motions

Hartwig, Johannes, Viessmann, Fabian, Henrich, Dominik

arXiv.org Artificial Intelligence, Aug-15-2025

For users who are not robot-programming experts, kinesthetic guiding can be an intuitive input method, especially as robot programming of in-contact tasks becomes more prominent. However, imprecise and noisy input signals from human demonstrations pose problems when motions are reproduced directly or when the signal is used as input for machine learning methods. This paper explores optimizing force signals so that they better correspond to the human intention behind the demonstrated signal. We compare different signal filtering methods and propose a peak detection method for dealing with first-contact deviations in the signal. The evaluation of these methods uses a specialized error criterion between the input signal and the human-intended signal. In addition, we analyze the influence of the critical parameters on the filtering methods. The quality of an individual motion could be improved by up to 20% with respect to the error criterion. The proposed contribution can improve the usability of robot programming and the interaction between humans and robots.
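
As a rough illustration of the two ingredients named above, smoothing a noisy force signal and detecting a first-contact peak, here is a generic sketch. It assumes a centered moving-average filter and a "first local maximum above a threshold" rule. Neither is claimed to be one of the filters or the peak detector compared in the paper, and the force trace, `window` and `threshold` values are hypothetical. Parameters such as the window length and threshold are the kind of tuning knobs whose influence a study like this would need to analyze.

```python
def moving_average(signal, window):
    """Centered moving average; near the edges it averages only the samples that exist."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        segment = signal[lo:hi]
        out.append(sum(segment) / len(segment))
    return out


def first_peak(signal, threshold):
    """Index of the first local maximum above threshold (e.g. a first-contact spike), else None."""
    for i in range(1, len(signal) - 1):
        if signal[i] > threshold and signal[i] >= signal[i - 1] and signal[i] > signal[i + 1]:
            return i
    return None


# Hypothetical force trace in newtons: free motion, a first-contact spike, then steady contact near 2 N.
force = [0.0, 0.1, 0.0, 4.0, 2.0, 2.1, 1.9, 2.0]
spike = first_peak(force, threshold=3.0)  # 3
smoothed = moving_average(force, window=3)
```

A plain moving average spreads the first-contact spike into neighboring samples instead of removing it, which suggests why a dedicated peak-handling step is useful in addition to generic filtering.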

artificial intelligence, critical parameter, demonstration

2507.15608

Country: Europe > Germany (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models

Zhang, Zhen, Yang, Yifan, Zhen, Kai, Susanj, Nathan, Mouchtaris, Athanasios, Kunzmann, Siegfried, Zhang, Zheng

arXiv.org Artificial Intelligence, Feb-17-2025

Large language models have demonstrated exceptional capabilities across diverse tasks, but their fine-tuning demands significant memory, posing challenges for resource-constrained environments. Zeroth-order (ZO) optimization provides a memory-efficient alternative by eliminating the need for backpropagation. However, ZO optimization suffers from high gradient variance, and prior research has largely focused on single-task learning, leaving its application to multi-task learning unexplored. Multi-task learning is crucial for leveraging shared knowledge across tasks to improve generalization, yet it introduces unique challenges under ZO settings, such as amplified gradient variance and collinearity. In this paper, we present MaZO, the first framework specifically designed for multi-task LLM fine-tuning under ZO optimization. MaZO tackles these challenges at the parameter level through two key innovations: a weight importance metric to identify critical parameters and a multi-task weight update mask to selectively update these parameters, reducing the dimensionality of the parameter space and mitigating task conflicts. Experiments demonstrate that MaZO achieves state-of-the-art performance, surpassing even multi-task learning methods designed for first-order optimization.
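
To show what "zeroth-order" and "update mask" mean mechanically, here is a minimal sketch of a masked two-point (SPSA/MeZO-style) estimator. It assumes a Gaussian perturbation restricted to masked coordinates, a central finite difference of the loss along that direction, and a step along the same direction. Only forward evaluations are used, so there is no backpropagation. The mask is supplied by hand here; MaZO's weight importance metric and its per-task masks are not reproduced. The quadratic `loss`, `lr`, `eps` and the seed are hypothetical choices for the demo.

```python
import random


def masked_zo_step(f, theta, mask, lr, eps, rng):
    """One two-point zeroth-order (SPSA/MeZO-style) step restricted to masked coordinates.

    Only forward evaluations of f are used; no backpropagation.
    """
    z = [rng.gauss(0.0, 1.0) if m else 0.0 for m in mask]  # perturb masked coords only
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    slope = (f(plus) - f(minus)) / (2 * eps)  # estimate of the directional derivative along z
    return [t - lr * slope * zi for t, zi in zip(theta, z)]


def loss(theta):
    return sum(t * t for t in theta)


rng = random.Random(0)
mask = [True, False, True, False]  # hypothetical mask; MaZO derives it from an importance metric
theta = [1.0, 1.0, 1.0, 1.0]
for _ in range(200):
    theta = masked_zo_step(loss, theta, mask, lr=0.05, eps=1e-3, rng=rng)
```

Restricting the perturbation to the masked coordinates shrinks the dimension the random direction is drawn from. This is one intuition for why masking can reduce the variance of ZO gradient estimates.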

large language model, machine learning, natural language

2502.11513

Country:

Europe (0.67)
North America > United States (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

No Data, No Optimization: A Lightweight Method To Disrupt Neural Networks With Sign-Flips

Galil, Ido, Kimhi, Moshe, El-Yaniv, Ran

arXiv.org Artificial Intelligence, Feb-11-2025

Deep neural networks (DNNs) power a wide range of applications, including safety-critical tasks such as autonomous driving, unmanned aerial vehicle (UAV) navigation, medical diagnostics, and robotics, where real-time decision-making is essential. However, the increasing reliance on DNNs also raises concerns about their resilience to malicious attacks. Ensuring the robustness of DNNs is crucial to maintaining their reliability in such critical applications. In this paper, we expose a critical vulnerability in DNNs that allows for severe disruption by flipping as few as one to ten sign bits, a tiny fraction of the model's parameters. Our method demonstrates how a small number of bit flips, within models containing up to hundreds of millions of parameters, can cause catastrophic degradation in performance. We systematically analyze and identify the parameters most susceptible to sign flips, which we term "critical parameters."
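
To see why a single sign flip can matter, note that in IEEE-754 float32 the sign is bit 31. Flipping that bit negates a weight while keeping its magnitude. The sketch below flips that bit via Python's `struct` module and applies it to a toy linear unit. Picking the largest-magnitude weight is an illustrative assumption only; the paper's criterion for identifying critical parameters is not reproduced here.

```python
import struct


def flip_sign_bit(x):
    """Flip bit 31 (the IEEE-754 sign bit) of x's float32 encoding and decode the result."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ 0x80000000))
    return flipped


# Toy linear unit y = w . x with an all-ones input.
weights = [0.5, 2.0, -0.25]
target = max(range(len(weights)), key=lambda i: abs(weights[i]))  # illustrative choice: largest |w|
attacked = list(weights)
attacked[target] = flip_sign_bit(attacked[target])
print(sum(weights), sum(attacked))  # 2.25 -1.75
```

One flipped bit out of 96 changes the output's sign and magnitude. In a deep network, such a change can propagate through later layers.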

artificial intelligence, machine learning, sign bit

2502.07408

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

AttentionBreaker: Adaptive Evolutionary Optimization for Unmasking Vulnerabilities in LLMs through Bit-Flip Attacks

Das, Sanjay, Bhattacharya, Swastik, Kundu, Souvik, Kundu, Shamik, Menon, Anand, Raha, Arnab, Basu, Kanad

arXiv.org Artificial Intelligence, Nov-20-2024

Large Language Models (LLMs) have revolutionized natural language processing (NLP), excelling in tasks like text generation and summarization. However, their increasing adoption in mission-critical applications raises concerns about hardware-based threats, particularly bit-flip attacks (BFAs). BFAs, enabled by fault injection methods such as Rowhammer, target model parameters in memory, compromising both integrity and performance. Identifying critical parameters for BFAs in the vast parameter space of LLMs poses significant challenges. While prior research suggests transformer-based architectures are inherently more robust to BFAs compared to traditional deep neural networks, we challenge this assumption. For the first time, we demonstrate that as few as three bit-flips can cause catastrophic performance degradation in an LLM with billions of parameters. Current BFA techniques are inadequate for exploiting this vulnerability due to the difficulty of efficiently identifying critical parameters within the immense parameter space. To address this, we propose AttentionBreaker, a novel framework tailored for LLMs that enables efficient traversal of the parameter space to identify critical parameters. Additionally, we introduce GenBFA, an evolutionary optimization strategy designed to refine the search further, isolating the most critical bits for an efficient and effective attack. Empirical results reveal the profound vulnerability of LLMs to AttentionBreaker. For example, merely three bit-flips (4.129 × 10^-9 % of total parameters) in the LLaMA3-8B-Instruct 8-bit quantized (W8) model result in a complete performance collapse: accuracy on MMLU tasks drops from 67.3% to 0%, and Wikitext perplexity skyrockets from 12.6 to 4.72 × 10^5. These findings underscore the effectiveness of AttentionBreaker in uncovering and exploiting critical vulnerabilities within LLM architectures.
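
One intuition for why a handful of flips can collapse an 8-bit quantized (W8) model is that, in a two's-complement int8 weight, the most significant bit carries a value of 128. The sketch below flips a single bit of an int8 value. It then assumes a hypothetical symmetric per-tensor dequantization scale of 0.01, under which a stored 3 (weight 0.03) becomes -125 (weight -1.25). The actual quantization scheme of the evaluated model and the AttentionBreaker/GenBFA search are not reproduced here.

```python
def flip_int8_bit(value, bit):
    """Flip one bit (0 = least significant, 7 = sign bit) of a two's-complement int8 value."""
    if not -128 <= value <= 127:
        raise ValueError("value out of int8 range")
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in 0..7")
    raw = (value & 0xFF) ^ (1 << bit)  # work on the unsigned byte
    return raw - 256 if raw >= 128 else raw


scale = 0.01  # hypothetical per-tensor dequantization scale
q = 3
q_flipped = flip_int8_bit(q, 7)  # -125
dequantized_before, dequantized_after = q * scale, q_flipped * scale
```

Flipping the least significant bit instead changes the stored value by only 1. This asymmetry is one reason attacks of this kind search for specific bits rather than flipping bits at random.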

attentionbreaker, large language model, machine learning

2411.13757

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
North America > United States > Texas (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Choosing the parameter of the Fermat distance: navigating geometry and noise

Chazal, Frédéric, Ferraris, Laure, Groisman, Pablo, Jonckheere, Matthieu, Pascal, Frédéric, Sapienza, Facundo

arXiv.org Machine Learning, Nov-30-2023

The Fermat distance has recently been established as a useful tool for machine learning tasks, either when a natural distance is not directly available to the practitioner or to improve on the results given by Euclidean distances by exploiting the geometrical and statistical properties of the dataset. This distance depends on a parameter α that greatly impacts the performance of subsequent tasks. Ideally, the value of α should be large enough to navigate the geometric intricacies inherent to the problem. At the same time, it should remain small enough to avoid the deleterious effects of noise on distance estimation. We study, both theoretically and through simulations, how to select this parameter.
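
For concreteness, here is a sketch of the sample Fermat distance in the form commonly used in the literature. For points x and y in a sample Q, it is the minimum over paths x = q_1, …, q_k = y through Q of Σ |q_{i+1} - q_i|^α. Normalization constants that some works include are omitted, and the function name is hypothetical. The sketch computes all pairs with Floyd-Warshall on the complete graph. With α = 1 it reduces to Euclidean distance. With larger α, long hops are penalized, so shortest paths prefer chains of short steps through dense regions of the sample. This is the geometric effect the abstract refers to, and it also shows why the choice of α matters.

```python
import math


def sample_fermat_distances(points, alpha):
    """All-pairs sample Fermat distance: shortest-path cost when each hop p -> q costs |p - q| ** alpha.

    Computed with Floyd-Warshall over the complete graph on the sample (O(n^3)).
    """
    n = len(points)
    d = [[math.dist(points[i], points[j]) ** alpha for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


pts = [(0.0,), (1.0,), (3.0,)]
print(sample_fermat_distances(pts, 1.0)[0][2])  # 3.0 (alpha = 1: straight-line distance)
print(sample_fermat_distances(pts, 2.0)[0][2])  # 5.0 (hops 1**2 + 2**2 beat the direct 3**2 = 9)
```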

artificial intelligence, fermat distance, machine learning

2311.18663

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
